9 research outputs found

    Automatic Data Extraction Utilizing Structural Similarity From A Set of Portable Document Format (PDF) Files

    Instead of storing data in databases, office workers often keep work-related data in document or report files that they can conveniently access with popular off-the-shelf software, such as Portable Document Format (PDF) files. Their workplaces may use databases, but the workers usually have neither the privileges nor the proficiency to fully utilize them. Such workplaces likely have front-end systems, such as a Management Information System (MIS), from which workers obtain reports or documents containing their data. These documents are meant for immediate or presentational use, but workers often keep the files because the data inside may be useful later. This way, they can manipulate and combine data from one or more report files to suit their needs when the MIS cannot. To do this, workers need to extract data from the report files. However, the files also contain formatting and other content such as organization banners and signature placeholders. Extracting data from these files is not easy, and workers are often forced to copy and paste repeatedly to get the data they want, which is tedious, time-consuming, and error-prone. Automatic data extraction is not new: many solutions exist, but they typically require human guidance before the extraction becomes truly automatic. They may also require expertise that makes workers hesitant to use them in the first place. A particular function of an MIS can produce many report files, each containing distinct data but structurally similar to the others. In this paper, we demonstrate that, for PDF files that come from the same source, exploiting this similarity makes it possible to create a fully automatic data extraction system that requires no human guidance.
First, a model is generated by analyzing a small sample of the PDFs, and then the model is used to extract data from all PDF files in the set. Our experiments show that the system can quickly achieve a 100% accuracy rate with very few sample files. On occasion, the data inside the PDFs are not sufficiently distinct from each other, resulting in accuracy below 100%, but this can be easily detected and fixed with slight human intervention. In these cases, eliminating human intervention entirely may not be possible, but the amount needed can be significantly reduced.
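
The idea of learning a model from a few structurally similar files can be illustrated with a minimal sketch. It is not the authors' system: it assumes each PDF has already been converted to a list of text lines (no PDF library is used here), and the function names and sample reports are hypothetical. A line position whose text is identical in every sample is treated as template; a position that varies is treated as data.

```python
def build_model(samples):
    """Return one flag per line position: True if the position carries data.

    Assumption: every sample is a list of text lines with the same layout.
    A position counts as data when its text differs between samples.
    """
    n_lines = min(len(doc) for doc in samples)
    model = []
    for i in range(n_lines):
        distinct_values = {doc[i] for doc in samples}
        model.append(len(distinct_values) > 1)
    return model


def extract(doc, model):
    """Keep only the lines of doc that the model marks as data."""
    return [line for line, is_data in zip(doc, model) if is_data]


# Hypothetical sample reports produced by the same report function.
samples = [
    ["ACME Corp", "Monthly Report", "Name: Alice", "Total: 120", "Signature"],
    ["ACME Corp", "Monthly Report", "Name: Bob", "Total: 95", "Signature"],
]
model = build_model(samples)

new_doc = ["ACME Corp", "Monthly Report", "Name: Carol", "Total: 101", "Signature"]
data = extract(new_doc, model)
```

The sketch also shows why insufficiently distinct data lowers accuracy: if a data field happens to hold the same value in every sample, it is indistinguishable from template text until a sample with a different value is added.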

    Malware Detection in Portable Document Format (PDF) Files with Byte Frequency Distribution (BFD) and Support Vector Machine (SVM)

    Portable Document Format (PDF) files, as well as files in several other formats such as .docx, .hwp, and .jpg, are often used to conduct cyber attacks. According to VirusTotal, PDF ranked fourth among the document file types most frequently used to spread malware in 2020. Malware detection is challenging partly because malware can stay hidden and adapt its own code, which makes outdated detection and classification methods less effective and calls for new, smarter methods. One such method for detecting PDF files infected with malware is machine learning. In this research, the Support Vector Machine (SVM) algorithm was used to detect PDF malware because of its ability to process non-linear data and because it produces the best accuracy in some studies. Each file was converted into bytes and then represented as a Byte Frequency Distribution (BFD). To reduce the dimensionality of the features, the Sequential Forward Selection (SFS) method was used. After the features were selected, an SVM model was trained. The proposed method performed well, achieving an accuracy of 99.11% and an F1 score of 99.65%. The contributions of this research are a new approach to detecting PDF malware using BFD and the SVM algorithm, and the use of SFS for feature selection to improve model performance. The proposed system can therefore serve as an alternative for detecting PDF malware.
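
A Byte Frequency Distribution can be sketched as a 256-bin histogram of byte values. The version below normalizes counts to relative frequencies, which is an assumption: the abstract does not say whether raw or normalized counts were used. The function name and the sample bytes are illustrative only.

```python
def byte_frequency_distribution(data):
    """Return a 256-element list with the relative frequency of each byte value."""
    counts = [0] * 256
    for byte in data:  # iterating over bytes yields integers in 0..255
        counts[byte] += 1
    total = len(data)
    if total == 0:
        return [0.0] * 256
    return [count / total for count in counts]


# Illustrative input: a few bytes that resemble the start and end of a PDF.
sample = b"%PDF-1.4\n%%EOF"
bfd = byte_frequency_distribution(sample)
```

The resulting 256-dimensional vector is the kind of feature vector that feature selection and a classifier would then consume. In practice the bytes would come from reading a file in binary mode.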

    Comparative Analysis Multi-Robot Formation Control Modeling Using Fuzzy Logic Type 2 – Particle Swarm Optimization

    A multi-robot system consists of several interconnected robots that communicate and collaborate to complete a goal. The robots are physically similar: each has two controlled wheels and one free wheel that moves at the same speed. The main remaining problem is controlling the movement of the multi-robot formation while it searches for the target, because the robots have to form dynamic geometric shapes on the way to the target. This movement requires a control system so that the robots reach the desired positions. In multi-robot formation movement, each robot follows its own predetermined trajectory, which stays relatively constant under varying speeds and accelerations and even sudden stops. Given these weaknesses, the robots must still be able to avoid obstacles and reach the target. This research used the Fuzzy Logic type 2 – Particle Swarm Optimization algorithm and compared it with Fuzzy Logic type 2 – Modified Particle Swarm Optimization and Fuzzy Logic type 2 – Dynamic Particle Swarm Optimization. Experiments carried out in each environment found that Fuzzy Logic type 2 – Modified Particle Swarm Optimization performed better in iteration count, time, and resource usage, and produced smoother robot movement, than Fuzzy Logic type 2 – Particle Swarm Optimization and Fuzzy Logic type 2 – Dynamic Particle Swarm Optimization.

    Automatic Clustering and Fuzzy Logical Relationship to Predict the Volume of Indonesia Natural Rubber Export

    Natural rubber is one of the pillars of Indonesia's export commodities. However, over the last few years, the export value of natural rubber has decreased due to an oversupply of this commodity in the global market. One way to address this problem is to predict the volume of Indonesia's natural rubber exports. Predicted values can also help the government compile periodic market intelligence for natural rubber commodities. In this study, the export volume of natural rubber was predicted using Automatic Clustering as the interval maker in a Fuzzy Time Series, an approach usually called Automatic Clustering and Fuzzy Logical Relationship (ACFLR). The data consist of 51 annual observations from 1970 to 2020. The purpose of this study is to predict the volume of Indonesia's natural rubber exports and to compare the prediction results of ACFLR and Chen's Fuzzy Time Series. The results showed a significant difference between the two methods: ACFLR achieved a MAPE of 0.5316%, while Chen's Fuzzy Time Series model achieved 8.009%. These results show that the ACFLR method performs better than the pure Fuzzy Time Series in predicting the volume of Indonesia's natural rubber exports.
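
For reference, the Mean Absolute Percentage Error (MAPE) used to compare the two models is the average of |actual - predicted| / |actual|, expressed as a percentage. The sketch below uses made-up numbers, not the study's export data, and the function name is an assumption.

```python
def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent.

    Assumes both sequences have the same length and no actual value is zero.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and equally long")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Illustrative values only (not taken from the study).
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]
error = mape(actual, predicted)  # (10% + 5% + 0%) / 3, i.e. about 5%
```

A lower MAPE means the predictions are, on average, closer to the observed values in relative terms.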

    Multiclass Segmentation of Pulmonary Diseases using Convolutional Neural Network

    Pulmonary disease affects tens of millions of people worldwide and causes the death of millions of sufferers every year. Lung disease also leads to other respiratory complications, which can themselves be fatal. The diagnosis of pulmonary diseases through medical imaging is a significant challenge in computer vision and medical image processing. The difficulty is due to the wide variety in the shape, dimension, and location of infected areas. Another challenge is differentiating one lung disease from another, which is a notable concern in diagnosis. In this study, we adopted a deep learning convolutional neural network to address these challenges. Seven models were constructed using the Mask Region-based Convolutional Neural Network (Mask-RCNN) architecture to detect and segment infected areas within the lung region from CT scan imagery. The evaluation results show that the best model obtained scores of 91.98%, 85.25%, and 93.75% for DSC, MIoU, and mAP, respectively. The segmentation results are then visualized.
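
Two of the reported metrics can be sketched for a single class. The Dice Similarity Coefficient (DSC) is 2|A∩B| / (|A| + |B|) and Intersection over Union (IoU) is |A∩B| / |A∪B|; MIoU is then the mean IoU over classes. The code below is a minimal sketch on flattened binary masks with made-up values, and the convention of returning 1.0 when both masks are empty is an assumption.

```python
def dice_and_iou(pred, truth):
    """Return (DSC, IoU) for two flat binary masks of equal length (values 0 or 1)."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same length")
    intersection = sum(p * t for p, t in zip(pred, truth))
    pred_area = sum(pred)
    truth_area = sum(truth)
    union = pred_area + truth_area - intersection
    if union == 0:
        # Both masks are empty; treated here as perfect agreement (assumption).
        return 1.0, 1.0
    dice = 2.0 * intersection / (pred_area + truth_area)
    iou = intersection / union
    return dice, iou


# Illustrative masks: 2 overlapping pixels, 3 predicted, 3 in the ground truth.
pred_mask = [1, 1, 0, 0, 1, 0]
truth_mask = [1, 0, 0, 1, 1, 0]
dice, iou = dice_and_iou(pred_mask, truth_mask)
```

For these masks the DSC is 2·2/6 and the IoU is 2/4, which illustrates that DSC is never lower than IoU for the same pair of masks.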

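    Comparative Analysis of the A* Algorithm Combined with the Dynamic Pathfinding Algorithm versus the Dynamic Pathfinding Algorithm Alone for NPCs in a Car Racing Game (original title: Analisa Perbandingan Algoritma A* dan Dynamic Pathfinding Algorithm dengan Dynamic Pathfinding Algorithm untuk NPC pada Car Racing Game)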

    A car racing game is a simulation game that needs Non-Playable Characters (NPCs) as opponents when the player wants to play alone. In a car racing game, an NPC needs pathfinding to drive along the track and avoid obstacles on the way to the finish line. The pathfinding methods used by the NPC in this game are the Dynamic Pathfinding Algorithm (DPA), which avoids static and dynamic obstacles on the track, and the A* algorithm, which finds the shortest route along the track. Experimental results show that an NPC using the combination of DPA and A* achieves better results than an NPC using DPA alone, while obstacle positions and track shape have a large influence on DPA.
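
The shortest-route part of this approach can be illustrated with a minimal A* sketch. It assumes a 4-connected grid with unit step cost and a Manhattan-distance heuristic; the grid, the function name, and these modelling choices are assumptions, not the game's actual track representation, and the Dynamic Pathfinding Algorithm is not shown.

```python
import heapq


def a_star(grid, start, goal):
    """Return the shortest path as a list of (row, col) cells, or None.

    grid is a list of rows; 0 is free track and 1 is an obstacle.
    """
    rows, cols = len(grid), len(grid[0])

    def heuristic(cell):
        # Manhattan distance never overestimates on a 4-connected unit grid.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(heuristic(start), 0, start)]
    came_from = {start: None}
    best_cost = {start: 0}

    while open_heap:
        _, cost, cell = heapq.heappop(open_heap)
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        if cost > best_cost[cell]:
            continue  # stale heap entry; a cheaper route was found already
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                new_cost = cost + 1
                if new_cost < best_cost.get(nxt, float("inf")):
                    best_cost[nxt] = new_cost
                    came_from[nxt] = cell
                    heapq.heappush(open_heap, (new_cost + heuristic(nxt), new_cost, nxt))
    return None


# Illustrative track: the only gap in the wall is in column 2.
track = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
]
path = a_star(track, (0, 0), (2, 0))
```

In this example the path has to detour through the gap in the middle row, so it visits seven cells even though start and goal are only two rows apart.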

    Software for Analyzing Webpage Similarity Based on Presentational Content (original title: Perangkat Lunak Penganalisis Kemiripan Webpage Berdasarkan Konten Presentasional)

    Besides its main information (content), a webpage also contains presentational content used to display that information. Within a website, the presentational content of one webpage tends to be similar to that of the other webpages on the site. Even when it is similar or identical, this presentational content is reloaded every time a webpage is loaded in the browser. If the similarity of the presentational content is high, much of the content loaded from the server is wasted. This research aims to develop software that can analyze the similarity of a group of webpages within a website. The data used are a collection of webpages from a website, downloaded using a web crawler. Based on the analysis of the website www.pusbangdik.unsri.ac.id, the presentational content of the webpages was found to be fairly similar, with an average similarity of 67% across all webpages and 58% for linked webpages only.
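
The abstract does not state how similarity is measured. One simple way to approximate it, shown here purely as an assumption, is to reduce each page to its sequence of opening HTML tags (ignoring the text content) and compare the sequences with the standard library's `difflib.SequenceMatcher`. The class and function names and the two sample pages are hypothetical.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Collect the names of opening tags in document order."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def tag_sequence(html):
    collector = TagCollector()
    collector.feed(html)
    collector.close()
    return collector.tags


def presentational_similarity(html_a, html_b):
    """Similarity in [0, 1] between the tag sequences of two pages."""
    matcher = SequenceMatcher(None, tag_sequence(html_a), tag_sequence(html_b))
    return matcher.ratio()


# Hypothetical pages that share a layout but differ in content.
page_a = "<html><body><div><h1>News</h1><p>Text A</p></div></body></html>"
page_b = "<html><body><div><h1>Staff</h1><p>Text B</p><p>More</p></div></body></html>"
similarity = presentational_similarity(page_a, page_b)
```

Averaging such pairwise scores over all crawled pages, or only over pairs connected by a link, would give site-level figures of the kind reported above. A real analysis would likely also account for attributes, stylesheets, and scripts.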

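    Training on Computer-Based Exam Server Installation at SMK Negeri 1 Ogan Komering Ulu (original title: PELATIHAN INSTALASI SERVER UJIAN BERBASIS KOMPUTER PADA SMK NEGERI 1 OGAN KOMERING ULU)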

    One problem that may occur when a school is about to hold the Computer-Based National Test (UNBK) is the unavailability of servers at the time of UNBK dissemination. During the dissemination process, everything is done only through simulation. This can cause problems, notably because the real-world implementation can differ significantly from the simulation. SMK Negeri 1 Ogan Komering Ulu is one of the few high schools capable of holding the UNBK in addition to the Paper-Based Test. To intensify preparation for the UNBK, a Moodle system was installed on the school's local server. It is expected to get the network administrators, students, and teachers ready for future UNBK sessions.

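    Computerized Financial Management at the Faculty of Computer Science, Universitas Sriwijaya, Using an Enterprise Resource Planning (ERP) System (original title: PENGELOLAAN KEUANGAN TERKOMPUTERISASI DI FAKULTAS ILMU KOMPUTER UNIVERSITAS SRIWIJAYA MENGGUNAKAN SISTEM ENTERPRISE RESOURCE PLANNING (ERP))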

    A university faculty has business processes so numerous and complex that it can be regarded as a corporation. Implementing an Enterprise Resource Planning (ERP) system in a faculty requires integrating all of its elements, such as departments and other units. An integrated system eases monitoring and evaluation in the faculty. This research analyzes the requirements for integrating the faculty and university finance systems at the Computer Science Faculty, Sriwijaya University. The results of the requirements analysis are then represented in UML diagrams and implemented on a Web platform. Based on the implementation results, all of the requirements have been implemented accordingly. The system is expected to streamline the finance processes in the Computer Science Faculty and to create a more accountable and transparent financial environment.